Application of high-performance computing to the reconstruction, analysis, and optimization of genome-scale metabolic models

Authors

  • Christopher S Henry
  • Fangfang Xia
  • Rick Stevens
Abstract

Over the past decade, genome-scale metabolic models have gained widespread acceptance in biology and bioengineering as a means of quantitatively predicting organism behavior based on the stoichiometry of the biochemical reactions that constitute the organism's metabolism. The list of applications for these models is growing rapidly; they have been used to identify essential genes, determine growth conditions, predict phenotypes, predict response to mutation, and study the impact of transcriptional regulation on organism phenotypes. This growing range of applications, combined with the rapidly growing number of available genome-scale models, is producing significant demand for computation to analyze these models. Here we discuss how high-performance computing may be applied with various algorithms for the reconstruction, analysis, and optimization of genome-scale metabolic models. We also present a case study demonstrating how the algorithm for simulating gene knockouts scales when run on up to 65,536 processors on Blue Gene/P. In this case study, the knockout of every possible combination of one, two, three, and four genes was simulated in the iBsu1103 genome-scale model of B. subtilis. In 162 minutes, 18,243,776,054 knockouts were simulated on 65,536 processors, revealing 288 essential single knockouts, 78 essential double knockouts, 99 essential triple knockouts, and only 28 essential quadruple knockouts.

Introduction

Genome-scale metabolic models represent one important end product of the genome annotation process. These models provide a means of rapidly translating detailed knowledge of thousands of enzymatic processes into quantitative predictions of whole-cell behavior. They have been applied extensively to identify essential genes and gene sets, predict organism phenotypes and growth conditions, design metabolic engineering strategies, and simulate the effects of transcriptional regulation on organism behavior [1-4].
A genome-scale metabolic model of an organism consists of three primary components: (1) a list of the reactions that take part in the organism's metabolism, including data on reaction stoichiometry and reversibility; (2) a set of gene-protein-reaction (GPR) mappings that capture how genes in the organism encode enzymes and how these enzymes catalyze metabolic reactions; and (3) a biomass objective function that indicates which small molecules must be produced for an organism to grow and divide [1].

SciDAC 2009, IOP Publishing. Journal of Physics: Conference Series 180 (2009) 012025. doi:10.1088/1742-6596/180/1/012025. © 2009 IOP Publishing Ltd.

All of these components are used in a method called flux balance analysis (FBA) to simulate organism metabolism under a specified set of environmental conditions. FBA uses linear optimization to define the limits on the metabolic capabilities of a model organism by assuming that the interior of the cell exists in a quasi-steady state [2-5]. This quasi-steady-state assumption is enforced by a set of linear mass balance constraints, one written for each metabolite included in the model:

$$N \cdot v = 0 \tag{1}$$

In the mass balance constraints (Eq. 1), N is the m × r matrix of the stoichiometric coefficients for the r reactions and m metabolites in the genome-scale metabolic model, and v is the r × 1 vector of the steady-state fluxes through the r reactions in the model. Bounds are placed on the reaction fluxes (v) depending on the reversibility of the reactions, with fluxes expressed in mmol per gram cell dry weight (CDW) per hour:

$$-100\ \text{mmol g CDW}^{-1}\,\text{h}^{-1} \le v_{i,\text{reversible}} \le 100\ \text{mmol g CDW}^{-1}\,\text{h}^{-1} \tag{2}$$

$$0.0\ \text{mmol g CDW}^{-1}\,\text{h}^{-1} \le v_{i,\text{irreversible}} \le 100\ \text{mmol g CDW}^{-1}\,\text{h}^{-1} \tag{3}$$

These mass balance constraints and reaction flux bounds form an underdetermined system of linear constraints with many possible solutions. Because the system is underdetermined, an optimization criterion is used to capture the most physiologically relevant region of the solution space.
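To see why the constraints leave many solutions open, it may help to write Eq. 1 out for a small network. The five-reaction network below is a hypothetical toy (it is not taken from the paper): an uptake reaction produces metabolite A, two parallel reactions convert A to B, one reaction converts B to C, and a biomass reaction consumes C. All reactions are assumed irreversible, so the bounds of Eq. 3 apply.

```latex
% Hypothetical toy network (illustration only, not from the paper)
\[
N =
\begin{pmatrix}
 1 & -1 & -1 &  0 &  0 \\  % metabolite A
 0 &  1 &  1 & -1 &  0 \\  % metabolite B
 0 &  0 &  0 &  1 & -1     % metabolite C
\end{pmatrix},
\qquad
v = \left(v_{\mathrm{up}},\; v_1,\; v_2,\; v_3,\; v_{\mathrm{bio}}\right)^{\mathsf{T}}
\]

% Eq. 1 written out row by row
\[
N \cdot v = 0
\quad\Longleftrightarrow\quad
v_{\mathrm{up}} = v_1 + v_2 = v_3 = v_{\mathrm{bio}}
\]

% General solution: m = 3 independent equations, r = 5 fluxes, so 2 free parameters
\[
v = \left(t,\; s,\; t - s,\; t,\; t\right)^{\mathsf{T}},
\qquad 0 \le s \le t \le 100
\]
```

With m = 3 independent mass balances and r = 5 fluxes, two parameters remain free: the total throughput t and its split s between the two parallel routes. Every choice within the bounds satisfies Eqs. 1 and 3, which is the sense in which the system is underdetermined and an objective is needed to select among solutions.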
The optimization criteria vary depending on the application, but the most common criterion is the maximization of growth yield [5, 6]. Maximum growth yield is simulated by maximizing the flux through the biomass reaction in the model while the uptake of nutrients is fixed at a specific ratio. This is a meaningful optimization criterion because organisms have been observed to grow at the maximum predicted yield when nutrients are plentiful [7].

The FBA formulation described here forms the core of many different algorithms for the reconstruction, analysis, and optimization of genome-scale metabolic models. For this reason, software developed to run FBA in a high-performance computing (HPC) environment may be directly applied to solving many different problems. FBA is also an attractive algorithm to run with HPC because only a small amount of data is required to fully define the variables, constraints, and objective necessary to simulate even the largest genome-scale models, and once FBA has been run, the only results that must be returned are the objective value and the list of reaction fluxes. Thus FBA requires minimal input and output, which improves scalability. Additionally, once the constraints and variables defining the FBA mass balances for a single genome-scale metabolic model have been loaded, many different simulations can be performed simply by adjusting the bounds on the loaded variables and repeating the optimization. Thus, much work can be done without loading additional data, further improving scalability.

Here we describe the various FBA-based algorithms that currently exist for the reconstruction, analysis, and optimization of genome-scale metabolic models. We focus in particular on algorithms that are good candidates for processing in HPC, and we discuss the implications of running these algorithms in parallel.
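The load-once, adjust-bounds, re-optimize pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the five-reaction network, the gene names `g1` to `g3`, the one-gene-per-reaction mapping, the uptake bound of 10, and the zero-growth lethality criterion are all hypothetical. The helper `max_flux` is a brute-force vertex enumeration standing in for a real LP solver, and the round-robin split by `rank` and `size` only mimics how work could be divided among MPI processes.

```python
from fractions import Fraction
from itertools import combinations, islice, product

def solve_square(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; None if A is singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n] for row in M]

def max_flux(N, bounds, objective):
    """Maximize v[objective] subject to N v = 0 and lb <= v <= ub.

    Stand-in for an LP solver: enumerates basic solutions, so it is only
    usable on toy models. Assumes the rows of N are linearly independent.
    Returns None if no feasible point is found.
    """
    m, r = len(N), len(N[0])
    best = None
    for basis in combinations(range(r), m):
        free = [j for j in range(r) if j not in basis]
        # Non-basic fluxes sit at one of their bounds.
        for fixed in product(*(sorted(set(bounds[j])) for j in free)):
            rhs = [-sum(N[i][j] * val for j, val in zip(free, fixed))
                   for i in range(m)]
            sol = solve_square([[N[i][j] for j in basis] for i in range(m)], rhs)
            if sol is None:
                continue
            v = [Fraction(0)] * r
            for j, val in zip(free, fixed):
                v[j] = Fraction(val)
            for j, val in zip(basis, sol):
                v[j] = val
            if all(bounds[j][0] <= v[j] <= bounds[j][1] for j in range(r)):
                if best is None or v[objective] > best:
                    best = v[objective]
    return best

# Hypothetical toy model, loaded once and reused for every simulation.
REACTIONS = ["uptake", "r1", "r2", "r3", "biomass"]
N = [[1, -1, -1, 0, 0],   # metabolite A
     [0, 1, 1, -1, 0],    # metabolite B
     [0, 0, 0, 1, -1]]    # metabolite C
WILD_TYPE_BOUNDS = [(0, 10), (0, 100), (0, 100), (0, 100), (0, 100)]
GENE_TO_REACTIONS = {"g1": ["r1"], "g2": ["r2"], "g3": ["r3"]}  # simplified GPR
BIOMASS = REACTIONS.index("biomass")

def growth_after_knockout(genes):
    """Zero the bounds of every reaction tied to a deleted gene, then re-optimize."""
    bounds = list(WILD_TYPE_BOUNDS)
    for gene in genes:
        for rxn in GENE_TO_REACTIONS[gene]:
            bounds[REACTIONS.index(rxn)] = (0, 0)
    return max_flux(N, bounds, BIOMASS)

def lethal_knockouts(max_size, rank=0, size=1):
    """Lethal gene sets of up to max_size genes within this worker's share."""
    lethal = []
    for k in range(1, max_size + 1):
        combos = combinations(sorted(GENE_TO_REACTIONS), k)
        # Round-robin split: worker `rank` takes every `size`-th combination.
        for combo in islice(combos, rank, None, size):
            if growth_after_knockout(combo) == 0:
                lethal.append(combo)
    return lethal

print(lethal_knockouts(2))
```

In this toy, deleting `g3` alone abolishes growth, whereas `g1` and `g2` encode parallel routes and are lethal only when deleted together. The stoichiometric matrix is never rebuilt between simulations; only the bounds change, which is the property the text identifies as favorable for scaling.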
As a case study, we implemented the FBA algorithm for gene knockout simulation using MPI on Blue Gene/P, and we used this algorithm to simulate the roughly 10^10 possible quadruple gene knockouts for the iBsu1103 genome-scale model of B. subtilis. This is the first time that the essential combinations of up to four genes have been identified in B. subtilis.

1. Model reconstruction algorithms

To run FBA simulations in HPC for a large variety of organisms, genome-scale metabolic models must first be created for those organisms. Fortunately, opportunities now exist for the use of HPC to accelerate this process. Reconstruction is the process of producing a functioning genome-scale model of an organism starting from the sequence of nucleotides in the organism's genome. This process includes gene calling, annotation, literature data mining, reaction mapping, biomass objective function assembly, error correction, and gap filling. Until recently, this process required years of manual effort to complete, and as a result, the rate of development of new genome-scale models lagged far behind the rate at which new genomes were being sequenced (figure 1) [8]. Fortunately, methods have recently emerged that expedite many of the

Similar resources

Application of Soft Computing Methods for the Estimation of Roadheader Performance from Schmidt Hammer Rebound Values

Estimation of roadheader performance is one of the main topics in determining the economics of underground excavation projects. Poor performance estimation of roadheaders can lead to costly contractual claims. In this paper, the application of soft computing methods for data analysis called adaptive neuro-fuzzy inference system-subtractive clustering method (ANFIS-SCM) and artificial neu...

Genome-Scale Metabolic Network Models of Bacillus Species Suggest that Model Improvement is Necessary for Biotechnological Applications

Background: A genome-scale metabolic network model (GEM) is a mathematical representation of an organism's metabolism. Today, GEMs are popular tools for computationally simulating biotechnological processes and for predicting biochemical properties of (engineered) strains. Objectives: In the present study, we have evaluated the predictive power of two ...

High-throughput generation and optimization of genome-scale metabolic models

Genome-scale metabolic models have proven to be crucial resources for translating detailed knowledge of thousands of distinct biochemical processes into global predictions of organism behavior. These models can be used to predict essential genes, organism phenotypes, organism response to mutations, and metabolic engineering strategies [1]. The models also serve as platforms for assessing and ...

Investigation on metabolism of cisplatin resistant ovarian cancer using a genome scale metabolic model and microarray data

Objective(s): Many cancer cells show significant resistance to drugs that kill drug-sensitive cancer cells and non-tumor cells, and such resistance might be a consequence of the difference in metabolism. Therefore, studying the metabolism of drug-resistant cancer cells and comparing it with drug-sensitive and normal cell lines is the objective of this research. Material and Methods: Metabolism of c...

Green Energy-aware task scheduling using the DVFS technique in Cloud Computing

Nowadays, energy consumption has become a critical issue in high-performance distributed computing systems, so green computing tries to reduce energy consumption, carbon footprint, and CO2 emissions in high-performance computing systems (HPCs) such as clusters, grids, and clouds that a large number of parallel. Reducing energy consumption for high-end computing can bring various benefits such as red...

Publication date: 2009